fodio API

FodioObj

class fodio.FodioObj[source]

Bases: object

This class is an empty marker with no behavior of its own. Item and ItemAttr inherit from it so that ItemMeta can recognize which class variables to collect.

ItemAttr

class fodio.ItemAttr(css_selector: str, accept_multiples: bool = False, raise_not_found: bool = True)[source]

Bases: fodio.FodioObj

This is the parent class of all other Attr classes.

Variables:
  • selector – The value passed in as css_selector
  • accept_multiples – The value passed in as accept_multiples
  • raise_not_found – The value passed in as raise_not_found

TextAttr

class fodio.TextAttr(css_selector: str, accept_multiples: bool = False, raise_not_found: bool = True)[source]

Bases: fodio.ItemAttr

Finds the text matched by a CSS selector.

load(document: str) → Union[List[str], NoneType, str][source]

Get the text from the matched contents from _find.

Parameters:document (str) – The HTML relative to the item.
Returns:Either a list of strings (if accept_multiples), None (if not raise_not_found) or a string.
Return type:Union[List[str], None, str]
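The three return shapes described above can be sketched without the library. This is a minimal illustration of the documented contract, not fodio's implementation: `shape_result` is a hypothetical helper, the `LookupError` is an assumed stand-in for whatever exception the library actually raises, and using the first match when accept_multiples is False is also an assumption.

```python
from typing import List, Union


def shape_result(
    matches: List[str],
    accept_multiples: bool = False,
    raise_not_found: bool = True,
) -> Union[List[str], None, str]:
    """Hypothetical helper mirroring the documented return shapes of load()."""
    if not matches:
        if raise_not_found:
            # Assumption: the real exception type is not documented here.
            raise LookupError("selector matched nothing")
        return None
    if accept_multiples:
        # Every match is returned, as a list of strings.
        return list(matches)
    # Assumption: with accept_multiples=False the first match is used.
    return matches[0]
```

Callers that pass raise_not_found=False should therefore check for None before using the result.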

LinkAttr

class fodio.LinkAttr(css_selector: str, accept_multiples: bool = False, raise_not_found: bool = True)[source]

Bases: fodio.ItemAttr

Finds the text in an a tag, alongside its href attribute, based on the CSS selector.

Variables:LINK – A named tuple representing the “link”. Contains .text and .url.

alias of Link

load(document: str) → Union[List[Link], NoneType, Link][source]

Get the link from the matched contents from _find.

Parameters:document (str) – The HTML relative to the item.
Returns:Either a list of LinkAttr.LINK (if accept_multiples), None (if not raise_not_found) or a LinkAttr.LINK.
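To show how such a result might be consumed, here is a small stand-in for LinkAttr.LINK. Only the field names `.text` and `.url` come from the description above; the field order and the sample values are assumptions made for illustration.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """Stand-in for LinkAttr.LINK; field order is an assumption."""

    text: str
    url: str


# Sample values are made up for the example.
link = Link(text="Docs", url="/docs")

# Fields are readable by name, and the tuple unpacks like any other tuple.
text, url = link
```

Because a named tuple is still a tuple, a list of links returned with accept_multiples can be unpacked directly in a for loop.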

CustomAttr

class fodio.CustomAttr(attrs: Iterable[str], css_selector: str, accept_multiples: bool = False, raise_not_found: bool = True)[source]

Bases: fodio.ItemAttr

This is an ItemAttr with which you can obtain any of a node’s attributes. If raise_not_found is False and an attribute can’t be found on the node, its value will be None instead.

Variables:value – An Iterable containing the values passed into attrs
load(document: str) → Union[List[Union[dict, NoneType]], NoneType, dict][source]

Get the node’s attributes based on the document.

Parameters:document (str) – The HTML relative to the css_selector.
Returns:Either None (if not raise_not_found), or a dict / list of dicts mapping attr names to attr values.
Return type:Union[List[Union[dict, None]], None, dict]
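The name-to-value mapping, with None filling in for missing attributes, might look like the sketch below. `pick_attrs` is a hypothetical helper operating on a plain mapping of a node's attributes, and the `KeyError` is an assumed placeholder for the library's real error; neither is part of fodio.

```python
from typing import Dict, Iterable, Mapping, Optional


def pick_attrs(
    node_attrs: Mapping[str, str],
    attrs: Iterable[str],
    raise_not_found: bool = True,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each requested attr name to its value."""
    result: Dict[str, Optional[str]] = {}
    for name in attrs:
        if name in node_attrs:
            result[name] = node_attrs[name]
        elif raise_not_found:
            # Assumption: the real exception type is not documented here.
            raise KeyError(name)
        else:
            # Missing attributes become None instead of raising.
            result[name] = None
    return result
```

For a node with attributes `{"href": "/a", "title": "A"}` and requested names `["href", "rel"]`, this sketch yields `{"href": "/a", "rel": None}` when raise_not_found is False.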

ItemMeta

class fodio.ItemMeta[source]

Bases: type

Adds all class variables that inherit from FodioObj to an _ATTRS class variable.
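The general technique, a metaclass that records which class variables derive from a marker base class, can be sketched as follows. All names here (`Marker`, `CollectingMeta`, `Attr`, `Example`) are hypothetical, and checking with `isinstance` only is an assumption; fodio's actual ItemMeta may differ.

```python
class Marker:
    """Empty base class, playing the role FodioObj plays above."""


class CollectingMeta(type):
    """Record the names of class variables that are Marker instances."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # The class namespace preserves definition order, so _ATTRS does too.
        cls._ATTRS = [
            key for key, value in namespace.items() if isinstance(value, Marker)
        ]
        return cls


class Attr(Marker):
    def __init__(self, selector):
        self.selector = selector


class Example(metaclass=CollectingMeta):
    title = Attr("h1")
    link = Attr("a")
    unrelated = 42  # not a Marker, so it is not collected
```

After the class body runs, `Example._ATTRS` holds the names `title` and `link`, while `unrelated` is skipped.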

Item

class fodio.Item[source]

Bases: fodio.FodioObj

An object to represent data on a page. To use, create class variables with Attr objects pointed at the data you desire. They will all share the first css selector passed in by the page class.

It’s also important to note that you MUST include a nested class named Meta (an ordinary inner class, not a metaclass). For example,

>>> class SomeSite(Item):
...     ...
...     class Meta:
...         selector = ".hello-there"
...         root_url = "https://some.site"
Variables:_ATTRS – A list containing the names for the ItemAttrs.
classmethod from_html(document: str) → Union[Dict[str, Any], List[Dict[str, Any]]][source]

Load an HTML document for parsing, and share the selected segment with the ItemAttrs.

Parameters:document (str) – The whole HTML page.
Returns:A dict with the keys as class var names to the ItemAttrs, and values as the parsed data. This will be a list if multiple entries for the Meta selector are found.
Return type:Union[Dict[str, Any], List[Dict[str, Any]]]
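Since the result is either one dict or a list of dicts, calling code may want to normalize it before iterating. The helper below is a caller-side sketch, not part of fodio; `as_list` and the `Parsed` alias are hypothetical names.

```python
from typing import Any, Dict, List, Union

Parsed = Dict[str, Any]


def as_list(result: Union[Parsed, List[Parsed]]) -> List[Parsed]:
    """Wrap a single dict in a list so callers can always iterate."""
    return result if isinstance(result, list) else [result]
```

With this in place, a loop over `as_list(...)` works the same whether the Meta selector matched once or several times.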
classmethod load(document: str) → Dict[str, Any][source]

Shorthand for from_html. Mainly used to make item objects compatible as ItemAttrs.

See from_html for more information.

classmethod search(url_path: str) → Dict[str, Any][source]

Fetch the URL and parse the document based on the items.

Parameters:url_path (str) – The URL path in the Meta.root_url site to parse.
Returns:A dict with the keys being the class variables and values as the loaded items.
Return type:Dict[str, Item]